Using Randomized Quantile Residuals to Assess Model Fit of Generalized Linear Mixed Models



Julia Piaskowski

May 14, 2025

https://some-link-to-talk.html

Linear Mixed Model

\[Y_{ij} = X_i\beta + Z_ja + \epsilon_{ij} \]

\[\epsilon \sim N(0, \sigma^2 \mathbf{I_n}) \] \[a \sim N(0, \sigma_a^2) \]

\[ y_i|a_{j} \sim𝑁(x_i \beta + a_j, \sigma^2) \]

\[ y_i \sim𝑁(x_i \beta, \sigma^2) \]

Standard Residual Plots

Standard Residuals

Raw \[\epsilon_i = Y_i - \hat{Y_i} \]

(internally) Studentized:

\[\epsilon_i = \frac {Y_i - \hat{Y_i}} {\hat{s}}\]

Pearson/Scaled:

\[ \epsilon_i = \frac {Y_i - \hat{Y_i}} {sd(Y_i)} \]

GLMM Formulation (binomial example)

Dependent variable: \(Y_{ij}\) (survived)

survival: \(p_i = Y_i/N\)

\[Y_{ij}|r_j \sim Binomial(N, \pi_{ij})\]

\[\eta_{ij} = \eta + \alpha_i + r_j\] (\(\eta\) = mean, \(\alpha_i\) = treatment, \(r_j\) = grouping variable)

\[\eta_{ij} = log(\frac{\pi_{ij}} {1 - \pi_{ij}} )\]

Standard Residuals from GLMMs Can Be Nonsensical

Raw/Scaled Residuals Can Lack Diagnostic Capabilities for GLMMs

  • Visual patterns are difficult to interpret
  • Over and under-dispersion are difficult to assess
  • Goodness of fit tests are not valid

Different distributional assumptions and mathematical mean/variance relationships of some distributions require a different approach

The Original Quantile Residuals

Background

For \(y_1,...,y_n\) responses:

\[y_i \sim \mathcal{P}(\mu_i, \phi)\]

CDF: \[F(y; \mu, \phi)\]

For a continuous \(F\), \(F(y_i; \mu_i, \phi)\) are uniformly distributed, and the quantile residuals are:

\[ r_{i} = \Phi^{-1} \left\{F(y_i; \hat{\mu_i}, \hat{\phi} )\right\}\]

Example: Survival Times of Patients

Survival: \(y_i \sim Exp(\mu_i)\)

\[ \text{log }\mu_i = \beta_0 + \beta_1 \text{ log } x_i \]

Quantile residuals:

\[ r_{i} = \Phi^{-1} \left\{ 1-exp(y_i/\hat{\mu_i}) \right\}\]

Discontinuous CDF, F

\[r_i = \Phi^{-1}(u_i)\] \[ u_i \sim Uniform(a_i, b_j]\] \[a_i = lim_{y \uparrow y_i} F(y; \hat{\mu_i}, \hat{\phi})\]

\[ b_i = F(y_i; \hat{\mu_i}, \hat{\phi}) \]

Binomial Example

\[y_i \sim binomial(n, p_i) \]

\[ \text{logit} (p_i) = \beta_0 + \beta_1x_i \] \[r_i = \Phi^{-1}(u_i)\]

\[ u_i \sim Uniform(a_i, b_j] \]

\[a_i = lim_{y \uparrow y_i} F(y; n, \hat{p_i}) =\displaystyle\sum_{i=0}^{\lfloor k-1 \rfloor} \begin{pmatrix} n \\ i \end{pmatrix} p_i(1-p)^{n-i}\]

\[b_i = F(y_i; n, \hat{p_i}) =\displaystyle\sum_{i=0}^{\lfloor k \rfloor} \begin{pmatrix} n \\ i \end{pmatrix} p_i(1-p)^{n-i}\]

Method implemented in ‘statmod’

Screenshot of statmod on CRAN

This Method Has Been Around

screenshot of 2003 statmod archive

Randomization is used to produce continuously distributed residuals when the response is discrete or has a discrete component. This means that the quantile residuals will vary from one realization to another for a given data set and fitted model.

–Smythe & Dunn (1996)

DHARMa

(Diagnostics for HierArchical Regression Models)

The DHARMa Process

  1. Model something using a generalized linear model with a chosen distribution and link function.

\[\mathbf{Y = X\beta+Za}\] \[ \mathbf{Y|a\sim G(\mu, R)} \] linear predictor: \(\mathbf{\eta = X\beta}\)

fitted value: \(E(Y) = \mu\)

Link function: \(\eta = g(\mu)\)

The DHARMa Process

For each observation in the data set:

  1. Simulate new observations using fitted value as the mean and model parameters (e.g. shape, scale) for the other distributional parameters.

\[ Y_i \sim G(\hat{\mu_i}, \hat{\phi}) \]

  1. Calculate the quantile for the cumulative distribution function

\[ r_{i} = F(y_i; \hat{\mu_i}, \hat{\phi} )\]

Expectations of the Quantile Residuals

\[ r_{ij} \sim Uniform(0,1) \]

\(r_{ij} = 0\): everything is larger

\(r_{ij} = 1\): everything is smaller

\(r_{ij} = 0.5\): right in the middle

DHARMa runs 250 simulations (per observation) by default, they recommend up to 1000

Poisson Example

\[ (\hat{Y_i} = \lambda = 5,; \quad Y_i = 7 )\]

Quantile Residuals for 1 Observation

Poisson GLMM Example

Create a data set with a random effect (“group”, 10 levels), a covariate (“Enviroinment1”) and a Poisson-distributed response variable (“observedResponse”):

testData = createData(sampleSize = 500)

Analyze correctly and incorrectly:

rightModel <- glmer(observedResponse ~ Environment1 + (1|group) , 
                     family = "poisson", data = testData)

wrongModel <- lmer(observedResponse ~ Environment1 + (1|group) , 
                     data = testData)

‘Standard’ Residuals

Quantile Residuals

Correctly Specified Model

Quantile Residuals

Incorrectly Specified Model

Distributions of Quantile Residuals

Quantile Function of the Standard Normal Distribution

Notes on DHARMa Implementation

  • It depends on the simulation functions build into GLMM package (supported packages: lme4, glmmTMB, …)
  • Commonly, only the last stochastic level (e.g. Poisson) is simulated, conditional on the fitted random effects - basically an omnibus test
  • Residuals can be simulated for individuals predictors and tests can be conducted for individual factors
  • for any kind of autocorrelation, conditional simulations are a must so that plots don’t indicate patterns or the tests fail due to autocorrelation

Conditional and Marginal Models

In Progress

(focus on lme4::glmer() and glmmTMB)

Other Functionality

(Implemented in DHARMa)

  • Uniformity test
  • Quantile location tesy
  • Outlier test
  • Tests for spatial and temporal autocorrelation
  • Test for zero-inflation
  • Over/Under dispersion tests

Best Practices

  • Use quantile residuals because they are a better diagnostic feature for checking model fit
  • Many of the tests are quite useful (but, as always, don’t rely on absolute threshold)
  • Outlier test results should be treated with caution
  • RTFM: read the DHARMa vignette; it is very helpful!
  • ….More

Sources

  • Bates DM, Maechler M, Bolker BM and S Walker (2015). “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

  • Brooks ME, Kristensen K, van Benthem KJ, Magnusson A, Berg CW, Nielsen A, Skaug HJ, Maechler M and BM Bolker (2017). “glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling.” The R Journal, 9(2), 378-400. doi: 10.32614/RJ-2017-066.

  • Dunn KP, and GK Smyth (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics 5, 1-10.

  • Feng et al (2017). Randomized quantile residuals: an omnibus model diagnostic tool with unified reference distribution. arXiv https://doi.org/10.48550/arXiv.1708.08527

Sources

  • Gelman A and J Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, New York.

  • Hartig F (2022). DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models. R package version 0.4.6, https://CRAN.R-project.org/package=DHARMa.

  • Pinheiro JC and DM Bates (2000). Mixed-Effects Models in S and S-PLUS. Springer Verlag, New York.

  • Stroup WW, Ptukhina M and J Garai (2024). Generalize Linear Mixed Models: Modern Concepts, Methods and Applications. CRC Press, Boca Raton.



(G)LMMs are hard - harder than you may think based on what you may have learned in your second statistics class….

–Ben Bolker, GLMM FAQ